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Abstract 

We present the results of an empirical study of the performance of 
the QR and Toda eigenvalue algorithms on random symmetric matri- 

»^h , ces. The random matrices are chosen from six ensembles, four of which 

d ' lie in the Wigner class. We observe a form of universality for the defla- 

tion time statistics for random matrices within the Wigner class: for 
these ensembles, the empirical distribution of a normalized deflation 
time is found to collapse onto a curve that depends only on the algo- 
rithm, but not on the matrix size or deflation tolerance provided the 

ly-j . matrix size is large enough (see Figure 3 and Figure 6). We also pro- 

CO ' vide a quantitative statistical picture of the accelerated convergence of 

^O , the shifted QR algorithm. 
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1 Introduction 



We present the results of a statistical study of the performance of the QR 
and Toda eigenvalue algorithms on random symmetric matrices. Our work is 
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mainly inspired by progress in quantifying the "probability of difficulty" and 
"typical behavior" for several numerical algorithms [7, 14]. This approach 
has led to a deeper understanding of the efficacy of fundamental numerical 
algorithms such as Gaussian elimination and the simplex method [24, 25, 26, 
29]. It has also stimulated new ideas in random matrix theory [10, 11, 13]. 
Testing eigenvalue algorithms with random input continues this effort. In 
related work, we have also studied the performance of a version of the matrix 
sign algorithm. However, these results are of a different character and are 
only summarized here [23]. 

It is natural to study the QR algorithm because of its elegance and 
fundamental practical importance. But in fact all the algorithms we study 
are linked by a common framework. In each case, an initial matrix Lq 
is diagonalized via a sequence of isospectral iterates L m . The gist of the 
framework is that the L m correspond exactly to the flow of a completely 
integrable Hamiltonian system evaluated at integer times. The Hamiltonian 
for these flows is of the form tr G(L) where G is a real-valued function 
defined on an interval. Different choices of G generate different algorithms: 
G{x) = x(\ogx — 1) yields unshifted QR, G(x) = x 2 /2 yields Toda, and 
G(x) = \x\ yields the matrix sign algorithm. 

Initial matrices are drawn from six ensembles that are well-understood in 
random matrix theory. These are listed below in Section 2.4. For many ran- 
dom matrix ensembles, as the size of the matrix grows, the density of eigen- 
values and suitably rescaled fluctuations have limiting distributions that may 
be computed explicitly. Four of the ensembles we study consist of random 
matrices with independent entries subject to the constraint of symmetry. 
The law of these entries is chosen so that these ensembles have the Wigner 
semicircle law as limiting spectral density. We say that these ensembles 
are in the Wigner class. Numerical experiments with these ensembles are 
contrasted with two ensembles that do not belong to the Wigner class. 

In evaluating these algorithms we focus on the statistics of deflation. 
Given a real, symmetric, n x n matrix L and an integer k between 1 and n, 
we write 

L=(*% L ' 2 ], L=( Ll1 ° |, ii 



where L\\ is a k x k block. Let \j and Xj, j = 1, . . . , n, denote the eigenvalues 
of L and L. For a fixed tolerance e > we say that L is deflated to L 
when the off-diagonal block L12 is so small that max,,- \Xj — Xj\ < e. The 
deflation time is the number of iterations m before L m can be deflated by 
a tolerance e > at some index k. The deflation index is this value of k. 



Since the iterative eigenvalue algorithms correspond to Hamiltonian flows, 
there is also a natural notion of deflation time for the Hamiltonian flows (see 
equations (17) and (18) below). 

Let us now explain why deflation serves as a useful measure of the time 
required to compute the eigenvalues of a matrix. The cost of practical com- 
putation requires an analysis of algorithms, hardware and software. In our 
study, we only focus on the algorithm, and "time" is taken to mean the 
number of iterations required for convergence. In our experiments we have 
observed that the Toda algorithm and unshifted QR algorithm deflate a 
matrix at the upper-left or lower-right corner with high probability. In par- 
ticular, the deflation index for unshifted QR is typically n — 1 (Figure 15 and 
Figure 16). As a consequence the deflation time is typically the same as the 
time taken to compute an eigenvalue. We then expect that the time taken 
to compute all eigenvalues with these algorithms is determined by n defla- 
tions. By contrast, we find that the matrix sign algorithm typically deflates 
a matrix in the middle and does not immediately yield any eigenvalues. In- 
stead, these are obtained after a divide-and-conquer procedure that consists 
of approximately log 2 n deflations. Thus, for all three algorithms a finite 
sequence of deflation times determines the number of iterations necessary 
to compute eigenvalues. We must note however, that we do not track all 
deflations in our experiments, only the first. This restriction is necessary to 
keep the datasets manageable as n increases. A more extensive study that 
tracks all deflation times for these algorithms will certainly yield further in- 
teresting information. Finally, as we show in Section 2.6 below, the notion 
of deflation time is also of theoretical value since it is the starting point for 
an analysis of the expected number of iterations for eigenvalue algorithms 
that is similar in spirit to [26]. 

Accelerated convergence under shifts makes the QR algorithm practical. 
Wilkinson showed that the QR algorithm on tridiagonal matrices is cubically 
convergent when the shift is either the entry L nn or the eigenvalue of the 
2x2 lower diagonal corner of the matrix that is closer to L nn [33] (this is 
generically true, see also [18] for a more careful analysis). As we have noted 
above, the unshifted QR algorithm deflates at index n — 1 with very high 
probability. The fact that the Wilkinson shift utilizes the lower 2x2 block 
of the matrix then explains the accelerated convergence of the shifted QR 
algorithm. This is borne out in statistically in our computations. We find 
that the number of iterations required to deflate a random matrix with the 
shifted QR algorithm is almost independent of n for matrices as large as 
190 x 190 (see Figure 13). 

Our main empirical findings concern universal deflation time distribu- 



tions for the QR and Toda algorithms for ensembles in the Wigner class. 
We sample the deflation time for a range of matrix size and deflation toler- 
ance combinations and normalize these empirical distributions to mean zero 
and variance one. The resulting histograms have the same general shape and 
in particular, the same tails on the right side (see in particular Figure 3 and 
Figure 6). We observe exponential tails for unshifted QR and Gaussian tails 
for Toda (Figure 5 and Figure 8) . Universality of the tails is quantified with 
a statistical methodology developed in [2]. The origin of such universality 
is not clear to us. Nor do we understand if our results are connected with 
better understood universality theorems in random matrix theory such as 
those that describe fluctuations in the bulk and at the edge of the spectrum 
for the orthogonal ensembles [20, 30]. Unlike these universality theorems, 
where the mean and variance are known theoretically, in our work the mean 
and variance of the deflation time are computed empirically and we have 
been unable to estimate precisely how these depend on re. It does appear 
however that the mean deflation time is linearly proportional to loge (see 
Figure 9 and Figure 11). 

We now discuss the algorithms and ensembles in greater detail. This is 
followed by a description of the results in Section 3. Implementation of the 
algorithms is discussed briefly in Section 4. 

2 Algorithms, ensembles and deflation statistics 

2.1 Notation 

We denote the space of real, symmetric nxn matrices by Symm(n) and the 
space of real n x m matrices by W axm . Matrices in Symm(n) are denoted 
L or M and the iterates of an eigenvalue algorithm are denoted L m or M m , 

m = 0, 1, 2, We use Q to denote an orthogonal matrix and R an upper 

triangular matrix with positive diagonal entries, typically with reference to 
a QR factorization. We use c(L) to denote the spectrum of L. The spectral 
decomposition of L £ Symm(n) is written L = UhU . Here U denotes 
the orthogonal matrix of eigenvectors of L and A = diag(Ai, A2, • • • , A n ) the 
matrix of eigenvalues. We also use the following standard notation. The 
nxn identity matrix is /; the standard basis in W 1 is (ei, . . . , e n ); the unit 
sphere in R n+1 and its positive orthant are S n and S 1 ™ respectively; the 
symmetric group of order n is S n . 

When L £ Symm(n) is tridiagonal, we denote its diagonal entries by 
(a±, . . . , a n ), and its off-diagonal entries by (61, . . . , 6 ra _i). A Jacobi matrix 
is a tridiagonal matrix with 6j > 0. The space of Jacobi matrices is de- 



noted Jac (n). We use u = U e\ to denote the first row of the matrix of 
eigenvectors. When L G Jac (n), <r(L) is simple and we may assume that 
Ai > A2 > • • • > A n . Moreover, all the components Ui are non-zero and we 
may assume that m > (this is also generically true for L £ Symm(n)). 
Let M denote the manifold {(A, u) € R n x 5"|Ai > A 2 > ... > A n }. 
The essence of the inverse spectral theory for Jacobi matrices is that L can 
be reconstructed if A and u are given. More precisely, the spectral map 
S : Jac (n) — > M defined by L 1— > (A, u) is a diffeomorphism [6, Thm 2]. 

We use the following standard notation for probabilistic notions. The 
phrase independent and identically distributed is abbreviated to iid. A nor- 
mal random variable with mean \x and variance a 2 is denoted J\f(fi,a 2 ); a 
Bernoulli random variable that is ±1 with probability 1/2 is denoted £>; a 
random variable with the x~distribution with parameter k is denoted Xk- 
The notation X ~ Y means that X has the same law as Y. 

2.2 The QR algorithm and Hamiltonian eigenvalue algorithms 

We assume the reader is familiar with the QR algorithm (excellent textbook 
presentations are [8, 15, 31]). In the unshifted QR algorithm the iterates 
M m are generated through QR factorizations and matrix multiplication in 
the "wrong order" : 

QmRm = M m , M m+ i = RmQmi 771 = 0,1,2,... (2) 

The shifted QR algorithm relies on a shift [i m at each step, and the modified 
steps 

^imt^-m = M-m Mm-' j -'"m+1 = -tirn^rn ~r /^m-' j 771 = U, 1, 2, . . . [o) 

Typical shifts are constructed from the lower 2x2 block of M m [15, p. 418]. 

In the early 80's it was discovered that the QR algorithm is intimately 
connected with integrable Hamiltonian systems [5, 6, 22, 27, 28]. We sum- 
marize these results below. An expanded presentation of these connections 
may be found in [4, 23]. A different exposition that explains these ideas in 
a fashion "intrinsic" to numerical linear algebra is [32]. 

Assume G is a piecewise smooth real-valued function defined on an in- 
terval, and set g = G' . If g is defined on a(L), we define g(L) := Ug(A)U T . 
Let M_ denote the strictly lower triangular part of the square matrix M 
and pr t M := M_ — M_ the projection of M onto skew-symmetric matrices. 
We then consider the ordinary differential equation 

L = [ Wi g(L),L\. (4) 



Equation (4) defines a completely integrable Hamiltonian flow on the space 
of symmetric matrices with Hamiltonian H(L) = tr G(L) and symplectic 
structure detailed in [5]. This flow is connected to the unshifted QR algo- 
rithm as follows. 

Theorem 1. Let g be a real-valued function defined on a (La). Then 

(a) The solution to equation (4) with initial condition Lq is an isospectral 
deformation 

L(t) = Q(t) T L Q(t), (5) 

where the orthogonal matrix Q(t) is given by the unique QR factoriza- 
tion 

e tg(L ) = Q( t )R(t), t > 0, (6) 

that has Q(0) = I and depends smoothly on t. 

(b) At integer times m = 0, 1, 2 . . . the solution L(m) satisfies 

e g{L{m)) = Mmj (7) 

where M m is the m-th step of the QR algorithm applied to the initial 
matrix M = e g ( L °\ 

(c) Assume that the spectrum ct(Lq) is simple and that g is injective on 
ct(Lq). Then L^ = lim^oo L(t) is a diagonal matrix consisting of the 
eigenvalues of Lq. 

The case of tridiagonal matrices is of practical and theoretical impor- 
tance. When Lq is tridiagonal, so is L(t), and the flow can be linearized 
using the inverse spectral theory for Jacobi matrices. 

Theorem 2. Assume Lq G Jac (n). Then the solution L(t) to (4) is an 
isospectral deformation L(t) = U(t) AU(t) and the evolution of u(t) = 
U(t) T e\ and L(t) is given explicitly by 

e t9i - K ^u n 

"(*)= ■■ tm in m = s- 1 (\,u(t)). ( 8 ) 

Assume g is injective on <j(Lq). Then 

limL(t) = diag(A CT1 ,...,A cr J, (9) 

t— >oo 

where a £ S n is the permutation such that g(X ffl ) > • • • > g(X a „)- 



Theorem 1 and Theorem 2 may be used to develop numerical schemes. 
The main observation is that each choice of a Hamiltonian corresponds to a 
choice of an algorithm. In particular, we have 

1. The unshifted QR algorithm: g(x) = log(x), G(x) = x(logx — 1) 
and H QR (L) = tr [LlogL - L] [22]. 

2. The Toda algorithm: g(x) = x, G(x) = x 2 /2 and i?Toda(£) = 
TjtrL 2 . In this case, equation (4) describes the evolution of the Toda 
lattice [21]. 

3. The matrix sign algorithm: g{x) = sign(x), G{x) = \x\ and H s [ gn (L) 

tr|L|. 

While every function G defines a Hamiltonian not all choices are equally 
relevant. Since our goal is to find the spectral decomposition of Lq, we 
must assume that Q and A are unknown. But then how are we to compute 
the matrix-valued functions g(L) or e 9 ^ L > efficiently? The choices g{x) = 
log 2 and g{x) = x are special since these give e 9 ^ L > = L and g(L) = L 
respectively. The first choice gives the QR algorithm (strictly speaking a 
branch of the logarithm must be chosen so that (4) is well-defined, but this 
does not affect the QR algorithm since it uses only the factorization step 
in Theorem 1(b)). For the second choice g(x) = x, the vector field (4) is 
faster to compute than the matrix exponential e ^ m ' and it is natural to use 
an ordinary differential equation solver for (4) to diagonalize L. This is the 
essence of the Toda algorithm. 

Our final choice g(x) = sign(x) requires further comment since the ob- 
servation that the matrix sign algorithm is Hamiltonian seems to us to 
be new. Assume zero is not an eigenvalue of Lq and let T,± denote the 
eigenspaces of Lq corresponding to positive and negative eigenvalues respec- 
tively. Consider matrices Q± whose columns form an orthonormal basis for 
S-t respectively. Then the matrices -P+ = Q+Q+ and P_ = Q-QT. are or- 
thogonal projections onto £± respectively and we find sign(Lo) = P+ — P- 
and (/ ± sign(Lo)) /2 = P±. It is immediate that 



c 



tsign(Lo) = & t p+ _ e -tp_^ and Hm e -i e isign(L„) = p+ ^ 



t— >oo 



The projection P + has a rank-revealing QR factorization P + = UoqRoqTI [17, 
Ch. 2.5]. The matrix sign algorithm rests on the fact that with Uqo as above, 



U^Una = L and the matrix 



L = U^LqUoo (11) 



is block-diagonal as in equation (1), where L\\ is k x k with k = dim(S + ). 
Clearly, a(L) = <j(Lq). 

Thus, the procedure to deflate a matrix using the matrix sign algorithm 

is: 

1. Given Lq compute sign(Lo) and hence P + = (I + sign(Lo)) /2. 

2. Compute U^ using a rank-revealing QR decomposition of P+. 

3. Compute L = U^LoU^. 

We note that sign(Lo) can be computed efficiently using a scaled Newton 
iteration and inverse-free modifications of this procedure [1, 17, 19]. The 
complete spectral decomposition of Lq may be determined in a sequence 
of deflation steps. At each stage, the number of iterations required to de- 
flate the matrix depends on the number of iterations required to compute 
sign(Lo). 

From the dynamical point of view, let L(t) denote the solution to (4) 
with g(L) = sign(L). Then it may be shown that for generic initial data 
II = I and lim$_ K30 L(£) = L where L = U^LqUoq is the block-diagonal 
matrix obtained above by the matrix sign algorithm. While this dynamical 
interpretation of the matrix sign algorithm is of theoretical interest, it is not 
clear how to implement it numerically in an effective manner. 

We have not tested the performance of the matrix sign algorithm with 
random input in full generality. Instead, we have tested the deflation be- 
havior of this algorithm in a more restricted setting by first precomputing 
sign(Lo) and then using Theorem 2. These results are not presented in this 
paper: the interested reader is referred to [23]. 

2.3 Deflation criterion 

Consider a symmetric matrix A with eigenvalues Ai > . . . > A n , a symmetric 
matrix B, e > and the perturbed matrix A + eB with eigenvalues Ai(e) > 
. . . A n (e). Standard perturbation theory [8, Thm 5.1] implies 

\K - X t (e)\ < e\\B\\ 2 . (12) 

When deflating Jacobi matrices the perturbation matrix is of the form 

B =Uf)- (i3) 



where the only non-zero entry in E\ k £ W- n ~ > is a one in the upper right 
corner. Clearly, ||-B||2 = 1 in this case. For the deflation of full symmetric 
matrices, the perturbation matrix has the structure 

"-(£?)• 

where again B21 £ R' n ~ ' x , but now all entries of B satisfy \bij\ < 1. In 
this case, one may show that ||-B||2 < \/k(n — k). 

We now define the deflation criterion. If L is a Jacobi matrix define 

e k = b k . (15) 

If L is a full symmetric matrix set 



i k = yk{n — k) max |77iy| (16) 

k<i<n 
l<j<k 

Assume L m is a sequence of iterates (Jacobi or full symmetric) obtained 
through an iterative eigenvalue algorithm. For a given tolerance e > and 
initial matrix Lq we define the deflation time 

t„ j(E (Lo) = min {m \ ik(L m ) < e for some 1 < k < n — 1}. (17) 

For calculations based on the Hamiltonian flow (4) it is more natural to 
consider the real valued deflation time 

T n ,e(Lo) = inf {t > 0| e k (L(t)) < e for some 1 < k < n— 1}. (18) 

The location where the matrix deflates is called the deflation index 

tnA L o) = arg min e k (L {Lo) ). (19) 

l<k<n— 1 

There is an important difference between deflation and the asymptotic 
convergence guaranteed by Theorem 1. While Theorem 1 may be used to 
compute asymptotic rates of convergence as t — > 00 [6, Thm 3], in practice 
the rate of convergence is determined by deflation and transients play an 
important role. We illustrate this with a simple example. 

Fix Ai > A2, let A = diag(Ai, A2) and consider the QR flow on Symm(2) 
with the initial matrix 

^QoAQj, «,-(«* _Zll). (20, 



According to Theorem 2, lim^oo Lit) = A for every 6q. However, if 6q ~ 
7r/2, Lq is a small perturbation of diag(A2, Ai) and in practice, the algorithm 
would immediately deflate and return Lq. But according to Theorem 2, Lit) 
must evolve so that the inital diagonal terms "turn around" and are pre- 
sented in the correct order diag(Ai, A2) as t — > 00 (see (9)). More generally, 
consider A = (Ai,...,A n ) with Ai > A2 > • • • > A n . Each permutation 
a £ S n yields a distinct fixed point A CT = (A^, . . . , A ffn ) for the QR and 
Toda algorithms. In a numerical calculation, an initial condition close to A CT 
is immediately deflated. Alternatively, iterates may pass close to one of the 
permutations A a and again deflation occurs at finite times. However, only 
the equilibrium (Ai, . . . , A n ) attracts generic initial conditions [6]. Thus the 
notion of convergence as t — > 00 and deflation are completely distinct. 

2.4 Ensembles 

We now introduce the six ensembles of random matrices that we will an- 
alyze. For general introductions on random matrices see [3, 12, 20]. The 
simplest way to construct an ensemble of random matrices is to choose en- 
tries independently subject only to the constraint of symmetry. We consider 
four such ensembles: 

1. the Gaussian Orthogonal Ensemble (GOE) (independent entries with 
M tl ~ V2Af{0, 1), Mij ~ AA(0, l),i>j); 

2. the Gaussian Wigner ensemble (iid M^ ~ A/"(0, 1), i > j); 

3. the Bernoulli ensemble (iid M^ ~ B, i > j ); 

4. the Hermite-1 ensemble on Jacobi matrices (iid a^ ~ A/"(0, 1), k = 
1, . . . , n and independent bk ~ Xk, k = 0, . . . , n — 1. ). 

(l)-(3) are ensembles of full symmetric matrices. The distinction between 
(1) and (2) is that the variance of the diagonal and off-diagonal entries of 
matrices in GOE is different to ensure orthogonal invariance. Hermite-1 
is an ensemble of Jacobi matrices obtained by applying the Householder 
tridiagonalization procedure to the GOE ensemble. It is a remarkable fact 
that the entries remain independent under tridiagonalization (this is not 
true when matrices from ensembles (2) and (3) are tridiagonalized) . 

More precisely, a choice of an ensemble of random, symmetric matrices is 
a choice of a probability measure on the space of symmetric matrices. When 
the matrix entries are independent this measure is a product measure. For 
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example, the measure corresponding to GOE has density 

P GOE (M) = 2 2n / 2 (27r)- n(n+1 ^ 4 e'-^ M2 \ (21) 

For all these ensembles, while the matrix entries are independent, the eigen- 
values are not. The joint density of eigenvalues for GOE and Hermite-1 
may be computed explicitly and is given by the determinantal formula [20, 
Ch.3] 

/i(A) = -i-|A n (A)|e- J ^ ) A n (\) = H(\ i -\ j ). (22) 

The normalization constant Z n may be computed explicitly. By contrast, 
while the analogues of (21) for ensembles (2) and (3) are clear, there is no 
explicit analogue for (22). 

We say that ensembles (1)— (4) are in the Wigner class because they have 
the same limiting spectral distribution. For each of these ensembles 

1 f b 

lim -#{Xi £ Vn(a,b)} = / u(x)dx, (23) 

ra->oo n J a 

where u(x) denotes the density of the Wigner semicircle law 

v {x) = ^V4-x 2 1 N<2 . (24) 

We will contrast our results on these ensembles with two ensembles of 
Jacobi matrices that are not in the Wigner class. These are: 

5. The uniform doubly stochastic Jacobi ensemble (UDSJ). 

6. The Jacobi uniform ensemble (JUE). 

Doubly stochastic Jacobi matrices of dimension n x n form a compact 
polytope in W 1 ^ 1 which we can equip with its uniform measure [9]. This 
is the UDSJ ensemble. We can approximately sample from this ensemble 
using a Gibbs sampler. 

JUE is defined using the spectral theory for Jac (n). Since we may 
describe Jacobi matrices by their spectral data (A, u), a probability measure 
on the spectral data pulls back under 5 _1 to a probability measure on 
Jac (n). For JUE, we replace (22) with eigenvalues chosen independently 
and uniformly on an interval and u distributed uniformly on the orthant 
S'? - ■ In our numerical simulations we assume the eigenvalues are uniformly 
distributed on [— 2y/n, 2y/n\ because this interval corresponds to the support 

11 



of the semicircle law and allows a comparison between JUE and ensembles 
in the Wigner class. A particularly important aspect of JUE is that the 
eigenvalues do not repel one another. This strongly affects the statistics of 
T n>t as shown below. 

2.5 The normalized deflation time 

We have now defined the algorithms, ensembles and deflation criterion. For 
a given algorithm and ensemble, T n ^ e {L) and i n ^ e {L) are random variables 
that depends on the random initial matrix L and e > 0. We explore the 
empirical distributions of T n>e and t n ^ in simulations. Our main empirical 
finding is that for each algorithm these empirical distributions collapse into 
a universal distribution for Wigner type ensembles. More precisely, let [i n ^ 
and u\ e denote the empirically determined mean and variance of r n>e (L) 
for a particular algorithm and ensemble. Our simulations suggest that the 
normalized deflation time 

T ^ = TVg ~ f^e (25) 

converges in distribution as n — > oo and e — > and that the limit is the same 
for ensembles in the Wigner class (see Figure 3 and Figure 6)). Both /i n ^ 
and a n>e are computed empirically. Our numerical calculations also suggest 
that [i n ^ ~ C|loge| for all ensembles in the Wigner class (see Figure 9 
and Figure 11). In order to prove convergence in distribution of T n>t it is 
first necessary to estimate the mean and variance of r. We present below a 
calculation of ^2,e that illustrates the subtle role of eigenvalue repulsion. 

2.6 The scaling of the expected deflation time 

In this section we estimate the expected deflation time of the Toda flow on 
Symm(2). We show that 

M2,e,GOE~ C*| log e|, but /U 2 , e ,juE ~ C|loge| 2 , e -)• 0. (26) 

The interval of support for the JUE density is chosen here to be [—1,1]. 
This choice only affects the prefactor C, not the term | logej 2 . 

In order to establish these asymptotics, we first determine the deflation 
time T e as a function of the initial condition (for brevity we write r e for 
T2, e since n = 2 is fixed). Since M(t) € Symm(2) we may write M = 
U(t)AU(t) T , where A = diag(Ai, A 2 ), Ai > A 2 , and 

U(t)-( C ° Sm S[nm ) (27) 

U[t) ~ \ sm0(t) - cos 6{t) J ' { > 

12 



Note that mi 2 > corresponds to 6 G (0, tt/2). We use Theorem 2 to obtain 

m 12 (t) = (M - A 2 )cos^)sin^) = (A, - A 2 ) • x + ^^ t J 6q (28) 

Here 6q = 0(0)- Now we set mi 2 (r e ) = e and solve to find 


log tan #o — log 



(Ai 



A 2 )r e 



Ai— A2 

2e 



(Ai-A 2 )2 



1 



The asymptotics of r e are easily determined. We have 

(Ai - A 2 )r e ~ - log e + log tan + log(Ai - A 2 ) 



mi2(0) < e 
mi 2 (0) > e. 

(29) 



0. 



(30) 



To compute the mean deflation time for GOE and JUE we first change to 
spectral variables. As noted above, the spectral map S is a diffeomorphism 
between the set of 2 x 2 symmetric matrices with m\i > and the set 
{Ai > A 2 } x (0, vr/2). The Jacobian of this transformation is Ai — A 2 so that 



dmndm22dmi2 = (Ai — A 2 )<iAi(iA 2 <i# 
The mean deflation time for GOE is then given by 

1 pod />\i fn/2 



(31) 



M2,e,GOE 



z t r 

A\ J-oo J-oo JO 

For JUE, the eigenvalues are chosen uniformly from [— 1, 1] and we find 

j pl pXj fir/2 



r e (Ai,A 2 ,60e~( Al+A 2)/ 4 (Ai - X 2 )dX 1 dX 2 dd. 

(32) 



A t 2,e,JUE 



z. 



2 J-\ J-\ JO 



t £ (Ai, X 2 ,9)dX 2 dX 2 d9. 



(33) 



Here Z\ and Z2 are normalizing constants for these probability densities. 

The asymptotic behavior of (30), combined with (32) and (33), suggests 
the following leading order behavior as e — > 0: 



^2,e,GOE 



M2,e,JUE 



logej 
Zx 

logej 

Z2 



00 r\\ /•tt/2 



00 J — 00 JO 

1 Mi /•?r/2 



-U-l JO 



-(A2+A|)/4 dAidA2 ^ _ Ci|loge| , (34) 
l mi2>6 dAidA 2 d0 ~ C 2 | log<(ffe) 



Ai — A 2 



Here Cj denote constants that may be computed explicitly. The second 
integral is divergent without the cut-off t mi2>e : the cut-off gives rise to an 
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additional factor of | log e| . These formal estimates may be made rigorous 
with a little effort. 

The analogous calculations for M{t) £ Jac (n) are quite subtle. For Ja- 
cobi matrices deflation occurs when M(t) approaches the boundary 9Jac (n) 
of Jac {n) (see for example [6, Figs. 6 and 7]). A theoretical analysis of such 
deflation, which we have not carried out yet, is a significant challenge as it 
requires a detailed understanding of the geometry of both the flow and the 
initial probability distribution in the vicinity of <9Jac (n) in high dimensions. 
For this reason, we have used the empirical mean fi n>e and variance a^ e in 
defining the normalized deflation time in (25). 

3 Results 

We generated a large number (typically 5000-10,000) of samples of the defla- 
tion time and the deflation index for each choice of the following parameters: 

1. an eigenvalue algorithm (QR, Toda, matrix sign). 

2. a random matrix ensemble. 

3. matrix size n (typically ranging from 10, 30, . . . 190.) 

4. tolerance e (typically 10~ fc , k = 2,4,6,8). 

We present a representative sample of our main results. Further statistical 
tests, figures and tables that amplify our conclusions may be found in [23]. 

3.1 Unsealed deflation time statistics for GOE 

We first present deflation time statistics for T n ^ for a fixed ensemble (GOE) 
for both QR and Toda algorithms. The statistics of T n ^ for the QR algorithm 
are shown in Figure 1. Similar statistics for the Toda algorithm are shown 
in Figure 2. These figures reflect the typical dependence of these algorithms 
on n and e for ensembles (1)— (6). Similar statistics for other ensembles may 
be found in [23, chapter 7]. In all cases, we observe that the histograms for 
the QR algorithm are relatively insensitive to n and shift to the right as e 
decreases. The histograms for the Toda algorithm shift to the right as n 
increases and e decreases as discussed below. 
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Figure 1: The QR algorithm applied to GOE. (a) Histogram for the 
empirical frequency of T n>€ as n ranges from 10, 30, . . . , 190 for a fixed defla- 
tion tolerance e = 10 -8 . The curves (there are 10 of them plotted one on top 
of another) do not depend significantly on n. (b) Histogram for empirical 
frequency of r nj£ when e = 10 , k = 2, 4, 6, 8 for fixed matrix size n = 190. 
The curves move to the right as e decreases. 

3.2 Normalized deflation time and universality for the Wigner 
class 

We now present results that show the collapse of all data onto universal 
curves depending only on the algorithm under the rescaling (25). The statis- 
tics of the empirical mean [i n ^ and standard deviation o n ^ are discussed a 
little later. The empirical distribution of the normalized deflation time T n>e 
for the QR algorithm with intial data from the Wigner ensembles is shown 
in Figure 3. All the data contained in Figure 1 collapse onto the single curve 
seen in Figure 3(a). Analogous data for the other Wigner class ensembles 
(2)-(4) collapse onto the same universal curve. The normalized deflation 
time distributions for UDSJ and JUE are shown in Figure 7. While we again 
observe a collapse of the data, it is not onto the curve of Figure 3(a). This 
contrast is amplified in the comparison of the tails of the normalized defla- 
tion time (see Figure 5). QQ plots that directly compare the histograms of 
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Figure 2: The Toda algorithm applied to GOE. (a) Histogram for em- 
pirical frequency of r n>e as n ranges from 10, 30, . . . , 190 for a fixed deflation 
tolerance e = 10~ 8 . The curves drift to the right as n increases, (b) His- 
togram for empirical frequency of T n>e when e = 10~ fc , k = 2, 4, 6, 8 for fixed 
matrix size n = 190. The curves move to the right as e decreases. 

these distributions may be found in [23]. 

We have also observed a similar universality for the Toda algorithm. 
The empirical distribution of the normalized deflation time for the Wigner 
ensembles is shown in Figure 6. Again, all the data contained in Figure 2 
collapse onto the single curve seen in Figure 6(a). Further, analogous data 
for the other Wigner ensembles (2)-(4) collapse onto the same curve. The 
data for UDS J and JUE collapse under normalization, but not onto the same 
distribution (see Figure 4 and Figure 8). 

Remark 1. We note that for both the QR and Toda algorithms the limiting 
distribution of the normalized deflation time T n>e for UDS J and JUE is dis- 
tinct from that of ensembles in the Wigner class. This raises the interesting 
issue in random matrix theory whether UDS J and JUE are in the same uni- 
versality class as Wigner ensembles and invariant ensembles. As JUE does 
not have eigenvalue repulsion built in, this is unlikely to be the case. 
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3.3 Universal tails for deflation times 

We used a hypothesis testing approach to quantify the statement that the 
rescaled deflation time has an exponential tail for QR and a Gaussian tail 
for Toda. Our approach is modeled on the methodology of [2]. Given defla- 
tion time data D we perform maximum likelihood estimation of parameters 
for distribution families conditioned on observing only values above a cutoff 
value x m i n (.D) and use a semiparametric approach to compute p-values for 
these parameters. Based on D and our parameter estimate, we compute 
resampled data sets and a modified Kolmogorov-Smirnov statistic measur- 
ing the distance between the empirical distribution function and the ones 
resulting from our maximum likelihood estimates. The semiparametric p— 
value is given as the proportion of instances that the resampled data sets 
yield larger modified KS statistics than the original. If this p-value is large 
we accept the hypothesis that the original data set has in fact the proposed 
decay in the right tail. 

We applied this approach with the Gaussian, Exponential, Weibull and 
Gamma families. We found that the exponential tails fit the QR runtime 
data especially well for small values of the deflation tolerance. The fit of 
the Toda runtime data to Gaussian tails is very compelling across most 
experimental regimes. Direct pictorial comparisons of the normalized Toda 
runtimes with the standard normal as well as normalized QR runtimes with 
normalized Gamma distributions are shown in Figure 5. Further details of 
the statistical tests may be found in [23]. 

3.4 The dependence of fi ne and a ne on n and e 

We used linear regression to express [i n ^ and a n>e as functions of log e and 
n. Only the best fits are reported here. The data for the QR algorithm was 
matched very well by 

Mn,e ~ a + ain + a 2 loge, (36) 

cr n ,e ~ b + bin + b 2 log e. (37) 

This regression is compared visually with the numerical data in Figure 9 
and Figure 10. The regression parameters are tabulated in Table 1 and 
Table 2. Since the means and variances do not visually appear to depend on 
n for ensembles (l)-(3) we have also included the p-values for the i— test of 
the hypothesis that the coefficient corresponding to the dimension is zero. 
Note that fi n>e and a n>e are almost identical for the ensembles (l)-(3) in 
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Table 1: Regression parameters for fi ne for the unshifted QR algorithm 



Ensemble 


ao 


a\ 


02 


P-value a\ 


GOE, Gaussian Wigner, Bernoulli 


1.96824 


0.0004690 


-1.0263649 


0.0095 


Hermite-1 


.802338 


-.004554 


-1.042907 


< 2 • 10~ i(i 


UDSJ 


0.7648330 


-0.0072921 


-1.0916354 


4.16 • 10~ 8 


JUE 


1.844126 


-0.003467 


-1.276037 


0.0256 



Table 2: Regression parameters for a ne for the unshifted QR algorithm 



Ensemble 


b 


h 


b 2 


P-value b\ 


GOE, Gaussian Wigner, Bernoulli 


1.2799509 


0.0005311 


-0.5854859 


0.0118 


Hermite-1 


0.442622 


-.003329 


-.617517 


8-10~ i5 


UDSJ 


1.066713 


-0.007584 


-0.658920 


0.000353 


JUE 


2.0044243 


-0.0026034 


-0.7961700 


0.000185 



the Wigner class, while for the Hermite-1 initial data both statistics have a 
slightly larger value. 

The deflation time depends more strongly on n for the Toda algorithm. 
We explored several regressions but our results for Toda are more ambiguous 
than for QR. We found that the non- Wigner ensembles (UDSJ and JUE) 
could be fit with an expression of the form (36)-(37). However, the Wigner 
class ensembles were better suited to the regression 



M«,e ~ a o + ai log n + a 2 log e 
&n,e ^b + b 1 log n + b 2 log e 



(38) 
(39) 



The results of this regression are presented in Figure 11, Figure 12, Table 3 
and Table 4. 

3.5 Deflation index statistics and the effect of the Wilkinson 
shift 

The remarkable acceleration of QR by shifting is of course well known. Our 
experiments provide a quantitative statistical picture for the efficacy of the 
shift. Figure 13 shows that the deflation time is sharply reduced by the 
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Table 3: Regression parameters for \x n ^ for the Toda algorithm. UDSJ and 
JUE are fit to (36)-(37) and the Wigner class ensembles are fit to (38)-(39). 



Ensemble 


a 


a x 


a 2 


GOE, Gaussian Wigner, Bernoulli 


-6.0669 


1.2888 


-0.7302 


Hermite-1 


-7.0273 


1.6795 


-0.7708 


UDSJ 


-34.01514 


0.02984 


-6.60133 


JUE 


-2.78614 


0.05318 


-0.74315 



Table 4: Regression parameters for a n ^ for the Toda algorithm. UDSJ and 
JUE are fit to (36)— (37) and the Wigner class ensembles are fit to (38)-(39). 



Ensemble 


b 


h 


b 2 


GOE, Gaussian Wigner, Bernoulli 


-1.6532 


0.3347 


-0.1569 


Hermite-1 


-2.1233 


0.6324 


-0.1727 


UDSJ 


-16.46367 


0.04845 


-3.10561 


JUE 


0.97525 


0.04068 


0.01451 
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Wilkinson shift. Figure 14 shows that the standard deviation of the deflation 
time is also sharply reduced by the shift. Deflation takes only a few iterations 
independent of the size of the matrix. This is in sharp contrast with the 
unshifted QR algorithm. 

An explanation for the speed-up lies in the statistics of the deflation 
index shown in Figure 15. We find that the unshifted QR algorithm deflates 
at the bottom right corner of the matrix with high probability. Since the 
Wilkinson shift uses only the 2x2 lower-right block of the matrix, small 
off-diagonal terms in this block accelerate the unshifted algorithm greatly. 
In contrast with the QR algorithm, the Toda algorithm deflates at both the 
upper-left and lower-right corner of the matrix (Figure 16). Note though 
that deflation is still predominantly at the corners of the matrix. Similar 
statistics for other ensembles may be found in [23]. 

4 Methods and implementation 

The algorithms were implemented in Python and run on a computing clus- 
ter using the module mpi4py. For numerical computations we relied on 
the scipy module except in the case of the RKPW spectral reconstruc- 
tion procedure [16] which was implemented in C. Our simulation strategy 
was to generate a number N of samples for each (e, n)-pair of tolerances 
e € {10~ fc : k = 2, 4, 6, 8} and matrix dimensions n £ {10, 30, ... , 190}. One 
initial matrix sample of size ni x rij is used to generate deflation time and 
deflation index samples for all pairs (e, rij), where e is in our list of tolerances. 
To do this we advance the matrix using the algorithm under consideration 
until we undercut each of the tolerances in the list and save the corre- 
sponding statistics along the way. Typically for each (e, n)-combination we 
generate between 1000 and 5000 samples. In the following we present a 
short summary of the implementation strategies chosen for the individual 
algorithms. 

4.1 QR algorithm 

Our simulation code uses the QR decomposition and matrix multiplication 
methods provided by scipy for the case of full symmetric matrices. For Ja- 
cobi matrices we implemented the efficient (unshifted) QR step presented for 
example in [15]. We augment these implementations to include the Wilkin- 
son shift by subtracting (adding) the shift value before (after) the QR step 
respectively. 
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4.2 Toda algorithm 

Both Jacobi and full symmetric matrices are treated similarly for this al- 
gorithm. The implementation uses the QR representation (8) to generate 
Toda steps T n as follows: 

M k = exp(T fe ) = Q k R k , (40) 

M k+1 = R k Q k , (41) 

T k+1 = log(M k+1 ). (42) 

Our implementation uses scipy routines for the matrix exponentials and 
matrix logarithms. Note that in general the matrix exponential of a Jacobi 
matrix is full symmetric, scipy is also used for the QR decomposition and 
standard matrix multiplication routines for the reverse order multiplication. 
Note that we do not use an ordinary differential equation solver to solve 
(4) and diagonalize the matrix as proposed in [6]. This is because our goal 
here is not to develop a competitive numerical scheme, but to compute 
reliable statistics of the deflation time for different algorithms. The above 
numerical scheme based on QR factorization was validated against both an 
ordinary differential equation solver based method and the use of the explicit 
solution (8) with the RKPW implementation of the inverse spectral map. 
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Figure 3: Universal deflation time statistics for the QR algorithm 
applied to the Wigner class. Empirical deflation time normalized as 
in (25) for e = l(T fe , k = 2, 4, 6, 8 and n ranging from 10, 30, ... , 190. The 
random matrix ensembles are (a) GOE; (b) Hermite-1; (c) Gaussian Wigner; 
and (d) Bernoulli. Each of the figures (a), (b), (c), and (d) is obtained by 
rescaling the data of 10 x 4 fixed-n and fixed-e histograms and plotting them 
together. All these data are observed to collapse onto one universal curve. 
Plotting all 160 histograms in one ^figure (as in Figure 5 below) further 
demonstrates the universality of the deflation algorithm. 
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Figure 4: The QR algorithm applied to non-Wigner ensembles. Nor- 
malized empirical deflation time distributions for the QR algorithm with 
e = 10 , k = 2,4,6,8 and n ranging from 10, 30, . . . , 190. The random 
matrix ensembles are (a) UDSJ and (b) JUE. Each figure contains the nor- 
malized empirical data of 40 fixed- n and fixed-e histograms. All these data 
are observed to collapse onto a single curve. However, these curves are not 
the same for UDSJ and JUE and neither of these coincides with the curve 
for Wigner data shown in Figure 3. 
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Figure 5: Exponential tail for the QR algorithm. Histograms of the 
normalized deflation time for the QR algorithms on a logarithmic scale, (a) 
Wigner data. Empirical normalized deflation time distributions from all 
160 histograms of Wigner class initial data (black dots) are compared with 
a gamma distribution with parameters k = 2 and 9=1 shifted to mean zero 
(gray line), (b) non- Wigner data. Empirical normalized deflation time 
distributions from 40 GOE histograms (black dots) is contrasted with data 
from 40 UDSJ histograms (gray squares). 
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Figure 6: Universal deflation time statistics for the Toda algorithm 
applied to the Wigner class. Empirical deflation time normalized as 
in (25) for e = l(T fe , k = 2, 4, 6, 8 and n ranging from 10, 30, ... , 190. The 
random matrix ensembles are (a) GOE; (b) Hermite-1; (c) Gaussian Wigner; 
and (d) Bernoulli. Again each of the figures (a), (b), (c), and (d) is obtained 
by rescaling the data of 40 fixed-n and fixed-e histograms and plotting them 
together. All these data are observed to collapse onto one universal curve. 
This universality is amplified in Figure 8 below. 
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Figure 7: The Toda algorithm applied to non-Wigner ensembles. 

Normalized empirical deflation time distributions for the Toda algorithm 
with e = 10 , k = 2, 4, 6, 8 and n ranging from 10, 30, . . . , 190. The ran- 
dom matrix ensembles are (a) UDSJ and (b) JUE. Each figure contains the 
normalized empirical data of 40 fixed-n and fixed-e histograms. All these 
data are observed to collapse onto a single curve. However, these curves are 
not the same for UDSJ and JUE and neither of these coincides with the 
curve for Wigner data shown in Figure 6 
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Figure 8: Gaussian tail for the Toda algorithm. Histograms of the 
normalized deflation time for the QR algorithms on a logarithmic scale, (a) 
Wigner data. Empirical normalized deflation time distributions from all 
160 histograms of Wigner class initial data (black dots) are compared with 
a standard normal distribution (gray line), (b) non- Wigner data. Empir- 
ical normalized deflation time distributions from 40 GOE histograms (black 
dots) is contrasted with data from 40 UDSJ histograms (gray squares). 
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Figure 9: Mean deflation time /% 6 for the QR algorithm. Empirical 
average of the deflation time for e = W~ k , k = 2, 4, 6, 8 and n in the range 
10, 30, ... , 190. (a) Wigner class initial data. The full lines are the 
empirical mean \x n ^ for GOE, Gaussian Wigner and Bernoulli ensembles. 
Note that they seem to align well with one another. The circles are the 
values obtained from the regression estimate (36) with the parameters listed 
in Table 1. The dashed line and triangles represent empirical data and the 
regression respectively for the Hermite-1 ensemble, (b) JUE and UDSJ 
initial data. The full line and dashed line are the empirical mean n n ^ for 
UDSJ and JUE data respectively. The circles and triangles are the regression 
estimates for UDSJ and JUE respectively. As e decreases the curves move up 
monotonically. The regression is not applied to the lowest curve in (b) since 
e = 0.01 is sufficiently large that several matrices deflate instantaneously. 
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Figure 10: Standard deviation a n>e of the deflation time for the QR 
algorithm, (a) Ensembles in the Wigner class, (b) JUE and UDSJ. Legend 
as in Figure 9 with regression parameters from Table 2. 
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Figure 11: Mean deflation time /i ne for the Toda algorithm. Mean 
deflation time distributions of the Toda algorithm for initial data described 
in Figure 9. (a) Wigner class initial data and (b) JUE and UDSJ initial 
data. Legend as in Figure 9 and regression parameters as in Table 3. 
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Figure 12: Standard deviation a ne of the deflation time for the Toda 
algorithm, (a) Ensembles in the Wigner class, (b) JUE and UDSJ. Legend 
as in Figure 9 and regression parameters as in Table 4. 
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Figure 13: The effect of the Wilkinson shift. Mean deflation time /j, n>e 
for the QR algorithm with the Wilkinson shift, (a) Wigner class ensembles, 
(b) JUE and UDSJ. Empirical data are generated for e = 10 -2 , ..., 10 -8 and 
n = 20, ...,190. The empirical data and a regression of the form (36) are 
presented in the same line-styles as Figure 9. Observe that (i n;e is almost 
independent of n and that the curves move upwards as e decreases as in 
Figure 9, but that the scale of the ordinate is different. The regression is 
not applied to the lowest curve in (b) since e = 0.01 is sufficiently large that 
several matrices deflate instantaneously. 
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Figure 14: Standard deviation of deflation time with the Wilkinson 
shift, (a) Wigner class ensembles, (b) JUE and UDSJ. Line-styles are as 
in Figure 13 with a regression of the form (37) 
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Figure 15: Empirical distributions of the deflation index t njt for the 
unshifted QR algorithm. The figures show histograms of the frequency 
with which deflation occurs at a given offdiagonal index. To aid visibility 
we have centered the distribution so that the peaks do not overlap. The 
off-diagonal index takes values between and n — 2. Here a,b and c refer to 
ensembles with n = 190, 130 and 70 respectively. The ensembles shown are 
(a) Hermite-1; (b) GOE; (c) UDSJ; and (d) JUE. 
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Figure 16: Empirical distributions of the deflation index 



b n.c 



for 



the Toda algorithm. The figures show histograms of the frequency with 
which deflation occurs at a given offdiagonal index. The ensembles are as 
in Figure 15. 
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